Longer Features: They do a speech detector good

نویسندگان

  • T. J. Tsai
  • Nelson Morgan
چکیده

We have incorporated spectrotemporal features in a speech activity detection (SAD) task for the Speech in Noisy Environments 2 (SPINE2) data set. The features were generated by applying 2D Gabor filters to the mel spectrogram in order to measure the strength of various spectral and temporal modulation frequencies in different patches of the spectrogram. Using several different back-ends, the Gabor features significantly outperformed MFCCs, yielding relative reductions in equal error rate (EER) of between 40 and 50%. Compared to the other backends, Adaboost with tree stumps performed particularly well with Gabor features and particularly poorly with MFCCs. An investigation into the reasons for this disparity suggests that the most useful features for SAD incorporate information over longer time scales.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and realisation of an audiovisual speech activity detector

For many speech telecommunication technologies a robust speech activity detector is important. An audio-only speech detector will give false positi-ves when the interfering signal is speech or has speech characteristics. The modality video is suitable to solve this problem. In this report the approach to and implementation of a decision-based audiovisual speech detector is given. Acoustic and v...

متن کامل

A New Algorithm for Voice Activity Detection Based on Wavelet Packets (RESEARCH NOTE)

Speech constitutes much of the communicated information; most other perceived audio signals do not carry nearly as much information. Indeed, much of the non-speech signals maybe classified as ‘noise’ in human communication. The process of separating conversational speech and noise is termed voice activity detection (VAD). This paper describes a new approach to VAD which is based on the Wavelet ...

متن کامل

Non-Speech Segment Rejection Based on Chaotic and Prosodic Features

Speech recognition systems, to have an acceptable performance in noisy environments, could be provided with a preprocessing unit, a speech/non-speech detector. So on, the detected speech segments will be delivered to the speech recognition core. In this work, we introduce a modified speech/non-speech detection system based on prosodic features characterizing the smoothness of the peak index ser...

متن کامل

Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain

This article presents a new feature extraction technique based on the temporal tracking of clusters in spectro-temporal features space. In the proposed method, auditory cortical outputs were clustered. The attributes of speech clusters were extracted as secondary features. However, the shape and position of speech clusters change during the time. The clusters temporally tracked and temporal tra...

متن کامل

The effect of redesign workstation on Speech Interference Level (SIL) among bank tellers

Abstract Background: There is always an interaction between man and his environment that can be the cause of physical, physiological and psychological stress on people and also cause discomfort, annoyance, and have direct and indirect effects on their performance and productivity, health and safety. People in their workplace are exposed to many factors related to work activities and environmen...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012